A Statistical Corpus-Based Term Extractor
Abstract
Term extraction is an important problem in natural language processing. In this paper, we propose a language-independent, statistical, corpus-based term extraction algorithm. In previous approaches, evaluation has been subjective, at best relying on a lexicographer's judgement. We evaluate our term extractor in two ways. First, we assess its predictiveness on an unseen corpus using perplexity. Second, we measure its precision and recall by comparing the Chinese words in a segmented corpus with the words extracted by our system.
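As an illustration of the second evaluation, the following is a minimal sketch of how precision and recall might be computed by comparing extracted terms with the words of a segmented corpus. It is not the paper's implementation: the function name `precision_recall`, the type-level (set-based) comparison, and the toy data are all assumptions made for this example.

```python
def precision_recall(extracted, reference):
    """Return (precision, recall) of extracted terms against reference words.

    Assumption: both sides are compared as sets of distinct word types;
    the paper may count tokens or use a different matching criterion.
    """
    extracted_set = set(extracted)
    reference_set = set(reference)
    correct = extracted_set & reference_set
    # Precision: share of extracted terms that are words in the reference.
    precision = len(correct) / len(extracted_set) if extracted_set else 0.0
    # Recall: share of reference words that the extractor found.
    recall = len(correct) / len(reference_set) if reference_set else 0.0
    return precision, recall


# Hypothetical toy data: a whitespace-segmented sentence and extractor output.
segmented_corpus = "我们 喜欢 自然 语言 处理".split()
extracted_terms = ["自然", "语言", "处理", "然语"]

precision, recall = precision_recall(extracted_terms, segmented_corpus)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case three of the four extracted strings are words in the segmented sentence, and three of its five words are recovered.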
Similar resources
Comparative Evaluation of C-value in the Treatment of Nested Terms
In statistical term extraction systems the identification and selection of nested term candidates often presents a challenge. The paper presents an implementation and evaluation of C-value, a heuristic that ranks and/or discards nested terms according to their stability in the corpus. The method was tested for English and Slovene, for both the overall performance of the term extractor improved ...
An Efficient Patent Keyword Extractor As Translation Resource
The paper addresses the issue of resource reuse in patent translation. It presents an efficient patent keyword/phrase extraction tool and illustrates how the tool can be used in patent translation by both human experts and MT developers. The keyword extraction is based on a new hybrid methodology providing for intelligent output and computationally attractive properties. The tool is composed of...
Improving Term Extraction with Terminological Resources
Studies of different term extractors on a corpus of the biomedical domain revealed decreasing performance when applied to highly technical texts. Facing the difficulty or impossibility of customizing existing tools, we developed a tunable term extractor. It exploits linguistic-based rules in combination with the reuse of existing terminologies, i.e. exogenous disambiguation. Experiments reported...
Concept Mining: A Conceptual Understanding based Approach
Due to the rapid daily growth of information, there is a considerable need to extract and discover valuable knowledge from data sources such as the World Wide Web. Most of the common techniques in text mining are based on the statistical analysis of a term, either a word or a phrase. These techniques consider documents as bags of words and pay no attention to the meanings of the document content...
Design of a Extraction System for Definitional Contexts from Biomedical Corpora
In this paper we report general progress on the design of a methodology for extracting definitional contexts from biomedical corpora in Spanish, taking into account a set of processes performed by the following modules: (i) a term extractor based on a hybrid method, (ii) a set of verbs that configure the syntactic structure of a definitional context, (iii) a chunker able to recognize those...
Publication date: 2001